## SYSTEM-LEVEL POWER/ENERGY OPTIMIZATION

1. Sources of Power Dissipation
2. Reducing Power Consumption
3. System Level Power Optimization
4. Dynamic Power Management
5. Mapping and Scheduling for Low Energy
6. Real-Time Scheduling with Dynamic Voltage Scaling


## Why is Power Consumption an Issue?

- Portable systems: battery lifetime!
- Systems with limited power budget: Mars Pathfinder, autonomous helicopter, ...
- Desktops and servers: high power consumption
  - raises temperature and deteriorates performance and reliability
  - increases the need for expensive cooling mechanisms
- One main difficulty with developing high performance chips is heat extraction.
- High power consumption has economic and ecological consequences.


## Sources of Power Dissipation in CMOS Devices

$$
P=\frac{1}{2} \times C \times V_{D D}^{2} \times f \times N_{S W}+Q_{S C} \times V_{D D} \times f \times N_{S W}+I_{\text {leak }} \times V_{D D}
$$

- $C$ = node capacitances
- $N_{SW}$ = switching activities (number of gate transitions per clock cycle)
- $f$ = frequency of operation
- $V_{DD}$ = supply voltage
- $Q_{SC}$ = charge carried by short circuit current per transition
- $I_{\text{leak}}$ = leakage current

$$
P=\underbrace{\frac{1}{2} \times C \times V_{DD}^{2} \times f \times N_{SW}}_{\text{switching power}}+\underbrace{Q_{SC} \times V_{DD} \times f \times N_{SW}}_{\text{short-circuit power}}+\underbrace{I_{\text{leak}} \times V_{DD}}_{\text{leakage power}}
$$

- Switching power (dynamic): power required to charge/discharge circuit nodes.
- Short-circuit power (dynamic): dissipation due to short-circuit current.
- Leakage power (static): dissipation due to leakage current.

- Earlier: leakage power was considered negligible compared to dynamic power.
- Today: total dissipation from leakage is approaching the total from dynamic power.
- As transistor sizes shrink, leakage power becomes significant.

- Leakage power is consumed even if the circuit is idle (standby). The only way to avoid it is to disconnect the circuit from the power supply.


- Short circuit power can be around $10 \%$ of total.
- Switching power is still the main source of power consumption.
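
The three terms of the power equation can be evaluated separately to see how they compare. The following is a minimal sketch; the function name `cmos_power` and all numeric values are hypothetical and chosen only for illustration, not taken from any real device.

```python
def cmos_power(c, vdd, f, n_sw, q_sc, i_leak):
    """Return (switching, short_circuit, leakage) power in watts."""
    switching = 0.5 * c * vdd**2 * f * n_sw   # charge/discharge of circuit nodes
    short_circuit = q_sc * vdd * f * n_sw     # short-circuit current per transition
    leakage = i_leak * vdd                    # consumed even when idle
    return switching, short_circuit, leakage


# Hypothetical operating point (illustrative values only).
sw, sc, lk = cmos_power(c=1e-9, vdd=1.2, f=500e6, n_sw=0.5,
                        q_sc=5e-11, i_leak=0.02)
total = sw + sc + lk
for name, p in (("switching", sw), ("short-circuit", sc), ("leakage", lk)):
    print(f"{name:14s} {p:.3f} W  ({100 * p / total:.1f}% of total)")
```

With these made-up values, switching power dominates and the short-circuit share stays below 10%, in line with the bullets above.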


## Power and Energy Consumption

$$
\begin{aligned}
& P=\frac{1}{2} \times C \times V_{D D}^{2} \times f \times N_{S W} \\
& E=P \times t=\frac{1}{2} \times C \times V_{D D}^{2} \times N_{C Y} \times N_{S W}
\end{aligned}
$$

$N_{C Y}=$ number of cycles needed for the particular task.
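
The step from $P$ to $E$ uses $t = N_{CY}/f$, so the frequency cancels out of the energy. A small sketch makes this visible; the parameter values are hypothetical.

```python
import math


def switching_power(c, vdd, f, n_sw):
    """Switching power in watts."""
    return 0.5 * c * vdd**2 * f * n_sw


def switching_energy(c, vdd, n_cy, n_sw):
    """Switching energy in joules for a task of n_cy cycles."""
    return 0.5 * c * vdd**2 * n_cy * n_sw


# Hypothetical values: same task, same voltage, two clock frequencies.
C, VDD, N_SW, N_CY = 1e-9, 1.2, 0.5, 1e6
results = {}
for f in (100e6, 200e6):
    p = switching_power(C, VDD, f, N_SW)
    t = N_CY / f                      # execution time of the task
    results[f] = (p, p * t)
    print(f"f = {f / 1e6:.0f} MHz: P = {p:.3f} W, t = {t * 1e3:.1f} ms, E = {p * t * 1e3:.3f} mJ")
```

Doubling the frequency doubles the power but halves the execution time, so at a fixed supply voltage the energy of the task stays the same.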

- In certain situations we are concerned about power consumption:
  - heat dissipation, cooling;
  - physical deterioration due to temperature.
- Sometimes we want to reduce total energy consumed:
  - battery life.


- Reducing power/energy consumption:
  - Reduce supply voltage ($V_{DD}$)
  - Reduce switching activity ($N_{SW}$)
  - Reduce capacitance ($C$)
  - Reduce number of cycles ($N_{CY}$)


## System Level Power/Energy Optimization

- Dynamic techniques: applied at run time in order to reduce power consumption by exploiting idle or low-workload periods.
- Static techniques: applied at design time.
  - Compilation for low power: instruction selection based on the power profile of the instructions, data placement in memory, register allocation.
  - Algorithm design: find the most power-efficient algorithm.
  - Task mapping and scheduling.


## System Level Power/Energy Optimization

Three techniques will be discussed:

1. Dynamic power management: a dynamic technique.
2. Task mapping: a static technique.
3. Task scheduling with dynamic voltage scaling: both static and dynamic.

## Dynamic Power Management (DPM)

| application |
| :---: |
| power aware OS |
| hardware |

Decisions:

- Switching among multiple power states:
  - idle
  - sleep
  - run
- Switching among multiple frequencies and voltage levels.

Goal:

- Energy optimization
- QoS constraints satisfied


## Dynamic Power Management (DPM)

Power states of the Intel XScale processor:

- RUN: operational
- IDLE: Clocks to the CPU are disabled; recovery is through interrupt.
- SLEEP: Mainly powered off; recovery through wake-up event.
- Other intermediate states: DEEP IDLE, STANDBY, DEEP SLEEP




## The Basic Concept of DPM

- When there are requests for a device $\rightarrow$ the device is busy; otherwise it is idle.
- When the device is idle, it can be shut down to enter a low-power sleeping state.


- Changing the power state takes time and extra energy:
  - $T_{sd}$: shutdown delay
  - $T_{wu}$: wake-up delay
- Send the device to sleep only if the saved energy justifies the overhead!
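
The "saved energy justifies the overhead" test can be written as a simple comparison. This is a sketch under stated assumptions, not a model from the slides: the idle power `p_idle`, sleep power `p_sleep`, and lumped transition energy `e_transition` are hypothetical parameters, and both transitions are assumed to fit inside the idle period.

```python
def sleep_pays_off(t_idle, p_idle, p_sleep, e_transition, t_sd, t_wu):
    """Return True if sleeping during an idle period of length t_idle saves energy.

    Assumptions (hypothetical model): e_transition is the total energy of
    shutting down and waking up; the device sleeps for t_idle - t_sd - t_wu.
    """
    if t_idle < t_sd + t_wu:
        return False  # not even enough time to complete both transitions
    energy_stay_idle = p_idle * t_idle
    energy_sleep = e_transition + p_sleep * (t_idle - t_sd - t_wu)
    return energy_sleep < energy_stay_idle


# Illustrative numbers: 1 W idle, 0.1 W asleep, 2 J per shutdown/wake-up pair.
params = dict(p_idle=1.0, p_sleep=0.1, e_transition=2.0, t_sd=0.5, t_wu=1.0)
for t_idle in (1.0, 2.0, 5.0):
    print(t_idle, sleep_pays_off(t_idle, **params))
```

With these numbers the short idle periods do not justify a shutdown, while the 5-second one does.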

- The main problems:
  - Don't shut down if the resulting wake-up delays occur too frequently.
  - Don't shut down if the savings from sleeping are smaller than the energy overhead of the state changes.


## Power Management Policies

| Workload | Requests |  | Requests |
| :---: | :---: | :---: | :---: |
| Device state | Busy | Idle | Busy |
| Power state | Working | $T_{sd}$, then Sleeping | $T_{wu}$, then Working |

- Power management policies are concerned with predictions of idle periods:
  - For shut-down: try to predict how long the idle period will be, in order to decide whether a shut-down should be performed.
  - For wake-up: try to predict when the idle period ends, in order to avoid user delays due to $T_{wu}$. Very difficult!


## Time-out Policy

- It is assumed that, after a device is idle for a period $t$, it will stay idle for at least a period which makes it efficient to shut down.

- Drawback: you waste energy during the period $t$ (compared to instantaneous shut-down).
- Policies:
  - Fixed time-out period: you set the value of $t$, which stays constant.
  - Adjusted at run-time: increase or decrease $t$, depending on the length of previous idle periods.
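
The cost of waiting for the time-out can be estimated on a trace of idle periods. The sketch below is illustrative only: the energy model, the halving/doubling rule in `adjust_timeout`, and all numbers are assumptions, not a policy taken from the slides.

```python
import math


def timeout_policy_energy(idle_periods, timeout, p_idle, p_sleep, e_transition):
    """Energy spent during idle periods under a fixed time-out policy."""
    total = 0.0
    for idle in idle_periods:
        if idle <= timeout:
            total += p_idle * idle            # request arrives before the time-out
        else:
            total += p_idle * timeout         # energy wasted while waiting
            total += e_transition             # shutdown + wake-up overhead
            total += p_sleep * (idle - timeout)
    return total


def adjust_timeout(timeout, last_idle, factor=2.0, t_min=0.25, t_max=8.0):
    """Hypothetical run-time adjustment: shorten t after long idle periods."""
    new = timeout / factor if last_idle > timeout else timeout * factor
    return min(max(new, t_min), t_max)


trace = [1.0, 8.0, 0.5, 12.0]                 # made-up idle periods in seconds
model = dict(p_idle=1.0, p_sleep=0.1, e_transition=2.0)
e_fixed = timeout_policy_energy(trace, timeout=2.0, **model)
e_instant = timeout_policy_energy(trace, timeout=0.0, **model)
e_never = timeout_policy_energy(trace, timeout=math.inf, **model)
print(e_fixed, e_instant, e_never)
```

On this trace, the time-out of 2 seconds is far better than never sleeping, but still pays for the waiting period before each shutdown.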


## Predictive Policy

- The length of an idle period is predicted. If the predicted idle period is long enough, the shut-down is performed immediately (time interval $t=0$ ).



## Example: A Very Simple Predictive Policy

- This is just a very particular example! The policy was proposed for one particular application, after intensive experiments, and might not work for any other application.
- Measurements on that application have shown an L-shaped distribution when the idle period is plotted against the previous busy period: long idle periods occur only after short busy periods.
- Resulting policy: shut down after a short busy period (shorter than a threshold $q$)!
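
The decision rule of this simple policy fits in a few lines. This is a minimal sketch; the function name, the threshold value, and the trace are hypothetical.

```python
def shutdown_decisions(busy_periods, q):
    """For each busy period, decide whether to shut down right after it.

    Rule from the example: a short busy period (< q) predicts a long idle period.
    """
    return [busy < q for busy in busy_periods]


# Made-up trace of busy-period lengths (seconds) and threshold q.
busy_trace = [0.2, 3.0, 0.4]
decisions = shutdown_decisions(busy_trace, q=1.0)
print(decisions)
```

The rule uses no time-out at all: the decision is taken immediately when the busy period ends.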

## Advanced Predictive Policies

- In the previous example the authors were very lucky!
  - One single application is running on the platform.
  - By profiling, they were able to draw a very simple conclusion regarding the run-time behaviour, expressed by that "L-shape" diagram.
- Most often the situation is not that simple:
  - We do not know in advance all applications running on the system.
  - The behaviour of the applications changes during run-time, depending on environment and input data.

More advanced run-time prediction techniques have to be applied, e.g. based on statistics, stochastic modelling, and machine learning.

## Dynamic Power Management (DPM)

- For many embedded systems, DPM techniques like those presented before are not appropriate:
  - They have time constraints $\rightarrow$ we have to keep deadlines (usually we cannot afford shut-down and wake-up times).
  - The OS is simple and fast $\rightarrow$ no sophisticated run-time techniques.
  - The application is known at design time $\rightarrow$ we know a lot about the application and can optimize already at design time.


## Mapping for Low Energy



Platform with two microprocessors mp3 and mp4, and a communication bus

First mapping ($t_8$ on mp3): execution time 52; energy consumed 75.

Second mapping ($t_8$ on mp4): execution time 57; energy consumed 70.

| Resource | Tasks in the second mapping |
| :--- | :--- |
| mp3 | $t_1$, $t_3$, $t_6$, $t_7$ |
| mp4 | $t_2$, $t_5$, $t_4$, $t_8$ |

| Task | WCET on mp3 | WCET on mp4 | Energy on mp3 | Energy on mp4 |
| :---: | :---: | :---: | :---: | :---: |
| $t_1$ | 5 | 6 | 5 | 3 |
| $t_2$ | 7 | 9 | 8 | 4 |
| $t_3$ | 5 | 6 | 5 | 3 |
| $t_4$ | 8 | 10 | 6 | 4 |
| $t_5$ | 10 | 11 | 8 | 6 |
| $t_6$ | 17 | 21 | 15 | 10 |
| $t_7$ | 10 | 14 | 8 | 7 |
| $t_8$ | 15 | 19 | 14 | 9 |

- The second mapping with $\mathrm{t}_{8}$ on mp4 consumes less energy;
- Assume that we have a maximum allowed delay $=60$.

This second mapping is preferable, even if it is slower!
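
The comparison of the two mappings can be checked against the energy table. The sketch below sums only the processor energy; the table gives no communication energy, so the totals of 75 and 70 stated on the slides presumably include a bus contribution that is not modelled here. The task-to-processor assignment for tasks other than $t_8$ is read from the schedule and should be treated as an assumption.

```python
# Energy per task on each processor, copied from the table above.
ENERGY = {
    "t1": {"mp3": 5, "mp4": 3},  "t2": {"mp3": 8, "mp4": 4},
    "t3": {"mp3": 5, "mp4": 3},  "t4": {"mp3": 6, "mp4": 4},
    "t5": {"mp3": 8, "mp4": 6},  "t6": {"mp3": 15, "mp4": 10},
    "t7": {"mp3": 8, "mp4": 7},  "t8": {"mp3": 14, "mp4": 9},
}


def processor_energy(mapping):
    """Sum of task energies for a task -> processor mapping (no bus energy)."""
    return sum(ENERGY[task][proc] for task, proc in mapping.items())


base = {"t1": "mp3", "t3": "mp3", "t6": "mp3", "t7": "mp3",
        "t2": "mp4", "t4": "mp4", "t5": "mp4"}
first = {**base, "t8": "mp3"}
second = {**base, "t8": "mp4"}
e_first, e_second = processor_energy(first), processor_energy(second)
print(e_first, e_second, e_first - e_second)

# Selection using the execution times and total energies stated on the slides.
MAX_DELAY = 60
candidates = [("first", 52, 75), ("second", 57, 70)]
feasible = [c for c in candidates if c[1] <= MAX_DELAY]
best = min(feasible, key=lambda c: c[2])
print("preferred mapping:", best[0])
```

The difference of 5 energy units between the two processor sums matches the difference between the slide totals (75 versus 70).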

## Real-Time Scheduling with Dynamic Voltage Scaling

- The energy consumed by a task, due to switching power:

$$
E=\frac{1}{2} \times C \times V_{D D}^{2} \times N_{C Y} \times N_{S W}
$$

$N_{\text {SW }}=$ number of gate transitions per clock cycle.
$N_{C Y}=$ number of cycles needed for the task.

- Reducing the supply voltage $V_{DD}$ is an efficient way to reduce energy consumption, because the energy depends quadratically on $V_{DD}$.
- The frequency at which the processor can be operated depends on $V_{DD}$:

$$
f=k \times \frac{\left(V_{DD}-V_{t}\right)^{2}}{V_{DD}}
$$

$k$ = circuit dependent constant; $V_{t}$ = threshold voltage.

- The execution time of the task depends on $V_{DD}$:

$$
t_{exe}=\frac{N_{CY}}{f}=N_{CY} \times \frac{V_{DD}}{k \times\left(V_{DD}-V_{t}\right)^{2}}
$$
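
The dependence of frequency and execution time on $V_{DD}$ can be tabulated directly from the two formulas. In this sketch the threshold voltage `V_T` and the calibration of `K` (chosen so that the processor runs at 50 MHz at 5 V) are hypothetical values.

```python
import math

V_T = 0.8                                   # hypothetical threshold voltage (V)
K = 50e6 * 5.0 / (5.0 - V_T) ** 2           # calibrated so that f(5 V) = 50 MHz


def frequency(vdd, k=K, vt=V_T):
    """Maximum operating frequency at supply voltage vdd."""
    return k * (vdd - vt) ** 2 / vdd


def execution_time(n_cy, vdd, k=K, vt=V_T):
    """Execution time of a task with n_cy cycles at supply voltage vdd."""
    return n_cy / frequency(vdd, k, vt)


N_CY = 1e9
for vdd in (5.0, 4.0, 2.5):
    print(f"VDD = {vdd} V: f = {frequency(vdd) / 1e6:.1f} MHz, "
          f"t_exe = {execution_time(N_CY, vdd):.1f} s")
```

Lowering the voltage reduces the energy per cycle but lengthens the execution time, which is the trade-off the scheduler has to manage.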


## Real-Time Scheduling with Dynamic Voltage Scaling

- The (classical) scheduling problem:

Which task to execute at a certain moment on a certain processor so that time constraints are fulfilled?

- The scheduling problem with voltage scaling:

Which task to execute at a certain moment on a certain processor, and at which voltage level, so that time constraints are fulfilled and energy consumption is minimised?

- The problem: reducing supply voltage extends execution time!


## Variable Voltage Processors



- Several supply voltage levels are available.
- Supply voltage can be changed during run-time.
- Frequency is adjusted to the current supply voltage.


## The Basic Principle

- We consider a single task $t$:
  - total computation: $10^{9}$ execution cycles.
  - deadline: 25 seconds.
  - processor nominal (maximum) voltage: 5V.
  - energy: 40 nJ/cycle at nominal voltage.
  - processor speed: 50 MHz ($50 \times 10^{6}$ cycles/sec) at nominal voltage.


- At a reduced voltage of 2.5V (the energy per cycle scales with $V_{DD}^{2}$; the frequency is assumed here to scale linearly with $V_{DD}$):
  - energy: $40 \times 2.5^{2} / 5^{2}=10$ nJ/cycle.
  - processor speed: $50 \times 2.5 / 5=25$ MHz ($25 \times 10^{6}$ cycles/sec).



Let's try a different solution!

- At 4V:
  - energy: $40 \times 4^{2} / 5^{2}=25.6 \approx 25$ nJ/cycle.
  - processor speed: $50 \times 4 / 5=40$ MHz ($40 \times 10^{6}$ cycles/sec); the $10^{9}$ cycles then take exactly 25 seconds.
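
The numbers of this example can be put together in a short calculation. The sketch compares three schedules for the single task; the particular split used for the mixed schedule (750 million cycles at 5V, the rest at 2.5V) is an assumption chosen so that the deadline is met exactly, and the unrounded value of 25.6 nJ/cycle is used at 4V.

```python
import math

N_CY = 1e9          # cycles of the task
DEADLINE = 25.0     # seconds
V_NOM = 5.0         # nominal voltage (V)
E_NOM = 40e-9       # energy per cycle at nominal voltage (J)
F_NOM = 50e6        # frequency at nominal voltage (Hz)


def energy_per_cycle(v):
    return E_NOM * (v / V_NOM) ** 2      # quadratic in the supply voltage


def freq(v):
    return F_NOM * v / V_NOM             # linear scaling, as in the example


def run(segments):
    """segments: list of (cycles, voltage); returns (time in s, energy in J)."""
    time = sum(cycles / freq(v) for cycles, v in segments)
    energy = sum(cycles * energy_per_cycle(v) for cycles, v in segments)
    return time, energy


full_speed = run([(N_CY, 5.0)])
mixed = run([(750e6, 5.0), (250e6, 2.5)])   # assumed split, meets the deadline exactly
constant = run([(N_CY, 4.0)])
for name, (t, e) in (("5V only", full_speed), ("5V + 2.5V", mixed), ("4V only", constant)):
    print(f"{name:10s} time = {t:.1f} s, energy = {e:.1f} J")
```

Under these assumptions, running the whole task at the single voltage that just meets the deadline consumes less energy than finishing early at 5V or mixing 5V with 2.5V.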





## The Basic Principle

- We consider two tasks $t_{1}$ and $t_{2}$:
  - computation: $t_{1}$: $250 \times 10^{6}$ execution cycles; $t_{2}$: $750 \times 10^{6}$ execution cycles.
  - deadline: 25 seconds.
  - processor nominal (maximum) voltage: 5V.
  - energy: 40 nJ/cycle at nominal voltage; at 4V: $40 \times 4^{2} / 5^{2}=25.6 \approx 25$ nJ/cycle.
  - processor speed: 50 MHz ($50 \times 10^{6}$ cycles/sec) at nominal voltage; at 4V: $50 \times 4 / 5=40$ MHz ($40 \times 10^{6}$ cycles/sec).





## Considering Task Particularities

- Energy consumed by a task:

$$
E=\frac{1}{2} \times C \times V_{DD}^{2} \times N_{CY} \times N_{SW}
$$

$N_{SW}$ = number of gate transitions per clock cycle; $C$ = switched capacitance per clock cycle.

- Average energy consumed by the task per cycle:

$$
E_{CY}=\frac{1}{2} \times C \times V_{DD}^{2} \times N_{SW}
$$

- Tasks often differ from each other in terms of executed operations $\rightarrow$ $N_{SW}$ and $C$ differ from one task to the other, so the average energy consumed per cycle differs from task to task.

## Considering Task Particularities

- We consider two tasks $t_{1}$ and $t_{2}$:
  - computation: $t_{1}$: $250 \times 10^{6}$ execution cycles; $t_{2}$: $750 \times 10^{6}$ execution cycles.
  - deadline: 25 seconds.
  - processor nominal (maximum) voltage: 5V.
  - processor speed: 50 MHz ($50 \times 10^{6}$ cycles/sec) at nominal voltage; at 4V: $50 \times 4 / 5=40$ MHz ($40 \times 10^{6}$ cycles/sec); at 2.5V: $50 \times 2.5 / 5=25$ MHz ($25 \times 10^{6}$ cycles/sec).

| Energy (nJ/cycle) | $V_{DD}=5$V | $V_{DD}=4$V | $V_{DD}=2.5$V |
| :---: | :---: | :---: | :---: |
| $t_{1}$ | 50 | 32 | 12.5 |
| $t_{2}$ | 12.5 | 8 | 3 |






## Considering Task Particularities

- If the energy consumption per cycle differs from task to task, the "basic principle" is no longer true!
- Voltage levels have to be reduced with priority for those tasks which have a larger energy consumption per cycle.
- One individual voltage level has to be established for each task, so that deadlines are just satisfied.
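
The effect can be checked with the per-cycle energies given for $t_1$ and $t_2$. The sketch compares a uniform 4V schedule with one that lowers the voltage only for the more expensive task; this particular pair of schedules is an illustrative choice, and both meet the 25-second deadline exactly.

```python
import math

CYCLES = {"t1": 250e6, "t2": 750e6}
FREQ = {5.0: 50e6, 4.0: 40e6, 2.5: 25e6}             # Hz, from the example
E_CY = {                                             # J/cycle, from the example
    "t1": {5.0: 50e-9, 4.0: 32e-9, 2.5: 12.5e-9},
    "t2": {5.0: 12.5e-9, 4.0: 8e-9, 2.5: 3e-9},
}


def evaluate(voltages):
    """voltages: task -> supply voltage; returns (total time in s, energy in J)."""
    time = sum(CYCLES[t] / FREQ[v] for t, v in voltages.items())
    energy = sum(CYCLES[t] * E_CY[t][v] for t, v in voltages.items())
    return time, energy


uniform = evaluate({"t1": 4.0, "t2": 4.0})
individual = evaluate({"t1": 2.5, "t2": 5.0})        # slow down the costly task only
print("uniform 4V:       time = %.1f s, energy = %.2f J" % uniform)
print("t1@2.5V, t2@5V:   time = %.1f s, energy = %.2f J" % individual)
```

Both schedules finish at the deadline, but lowering the voltage of $t_1$ (the task with the larger energy per cycle) gives the smaller total.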


## Discrete Voltage Levels

- Practical microprocessors can work only at a finite number of discrete voltage levels.
- The "ideal" voltage $V_{\text{ideal}}$ determined for a certain task may therefore not be available on the processor.


- A task is supposed to run for time $t_{exe}$ at the voltage $V_{\text{ideal}}$.
- On the particular processor, the two closest available neighbours to $V_{\text{ideal}}$ are $V_{1}<V_{\text{ideal}}<V_{2}$.
- You have minimised the energy if you run the task for time $t_{1}$ at voltage $V_{1}$ and for time $t_{2}$ at voltage $V_{2}$, so that $t_{1}+t_{2}=t_{exe}$ and all cycles of the task are executed within $t_{exe}$.
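
The two times follow from two conditions: $t_1 + t_2 = t_{exe}$ and $t_1 f_1 + t_2 f_2 = N_{CY}$, where $f_1$ and $f_2$ are the frequencies at $V_1$ and $V_2$. The sketch below solves them; the function name is hypothetical, and the example values reuse the 25 MHz and 50 MHz levels from the earlier single-task example.

```python
import math


def split_between_levels(n_cy, t_exe, f1, f2):
    """Return (t1, t2): time at the lower level (f1) and at the higher level (f2).

    Solves t1 + t2 = t_exe and t1*f1 + t2*f2 = n_cy, assuming f1 < f2.
    """
    t2 = (n_cy - f1 * t_exe) / (f2 - f1)
    t1 = t_exe - t2
    if t1 < 0 or t2 < 0:
        raise ValueError("n_cy cannot be completed in t_exe between these levels")
    return t1, t2


# 10^9 cycles in 25 s; ideal level would be 40 MHz, available: 25 MHz and 50 MHz.
t1, t2 = split_between_levels(n_cy=1e9, t_exe=25.0, f1=25e6, f2=50e6)
print(f"t1 = {t1:.1f} s at the lower voltage, t2 = {t2:.1f} s at the higher voltage")
```

The task then spends 10 seconds at the lower level and 15 seconds at the higher level, completing exactly $10^9$ cycles.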


## The Pitfalls with Ignoring Leakage

$$
E=\underbrace{\frac{1}{2} \times C \times V_{DD}^{2} \times N_{CY} \times N_{SW}}_{\text{dynamic}}+\underbrace{L_{g} \times\left(V_{DD} \times K_{3} \times e^{K_{4} \times V_{DD}} \times e^{K_{5} \times V_{bs}}+\left|V_{bs}\right| \times I_{ju}\right) \times t}_{\text{leakage}}
$$

- $C$ = node capacitances
- $N_{SW}$ = switching activities (number of gate transitions per clock cycle)
- $N_{CY}$ = number of cycles needed for the task
- $f$ = frequency of operation (the execution time is $t=N_{CY}/f$)
- $V_{DD}$ = supply voltage
- $K_{3..5}$ = technology dependent constants
- $L_{g}$ = number of gates
- $V_{bs}$ = body-bias voltage
- $I_{ju}$ = body junction leakage current

If we optimize only the dynamic energy and ignore the rest:

1. We don't optimize the global energy, but only a part of it!
2. We can even get it very wrong and increase the energy consumption!
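
The second pitfall can be illustrated numerically. The sketch below is built entirely on hypothetical constants: `A_LEAK` stands in for $L_g \times K_3$, `B_LEAK` for $K_4$, the body bias is set to zero (so the junction term vanishes), and the frequency follows $f = k (V_{DD}-V_t)^2 / V_{DD}$ with an assumed threshold voltage. It is not calibrated to any real technology.

```python
import math

N_CY = 1e9
V_T = 0.8                                   # assumed threshold voltage (V)
K = 50e6 * 5.0 / (5.0 - V_T) ** 2           # so that f(5 V) = 50 MHz
A_LEAK, B_LEAK = 0.02, 0.3                  # hypothetical leakage constants


def frequency(v):
    return K * (v - V_T) ** 2 / v


def dynamic_energy(v):
    return 40e-9 * (v / 5.0) ** 2 * N_CY    # 40 nJ/cycle at 5 V, quadratic in v


def leakage_energy(v):
    leak_power = A_LEAK * v * math.exp(B_LEAK * v)
    return leak_power * N_CY / frequency(v)  # leakage power times execution time


def total_energy(v):
    return dynamic_energy(v) + leakage_energy(v)


voltages = [i / 10 for i in range(10, 51)]   # 1.0 V ... 5.0 V
v_dyn = min(voltages, key=dynamic_energy)    # optimum when leakage is ignored
v_tot = min(voltages, key=total_energy)      # optimum for the total energy
print(f"dynamic-only optimum: {v_dyn} V, total energy there: {total_energy(v_dyn):.1f} J")
print(f"total-energy optimum: {v_tot} V, total energy there: {total_energy(v_tot):.1f} J")
```

Ignoring leakage pushes the voltage to its minimum; with leakage included, the long execution time at that voltage makes the total energy higher than at a somewhat larger voltage.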


